Image Wavelet Decomposition 
and Applications 

N. Treil, S. Mallat 
R. Bajcsy 

MS-CIS-89-22 
GRASP LAB 177 


/!/ /9 S A 
/A! '■ ? 




«S***' r «•<**'" 



*S 

,y 


Department of Computer and Information Science 
School of Engineering and Applied Science 
University of Pennsylvania 
Philadelphia, PA 19104 

April 1989 


Acknowledgements: This research was supported in part by Air Force AFOSR 
F 49620-85-K-00 1 8, U.S. Army grants DAA29-84-K-0061 , DAA29-84-9-0027, N0014-85- 
K-0807, NSF grants MCS-8219196-CER, IRI84-10413-AO2, INT85-14199, DMC85-17315, NIH 
NS-1 0939-1 1 as part of the Cerebro Vascular Research Center, NIH 1-RO1-NS-23636-01 
NATO grant 0224/85, NASA NAG5-1045, ONR SB-35923-0, DARPA grant N00014-85- ’ 
K-0018, and by DEC Corporation, IBM Corporation and LORD Corporation. 

(NA$A-Cfc-1882I7) IMAGE WAVELET 
DECOMPOSITION AND APPLICATIONS 
(Pennsylvania Uni v.) 43 p 


N9 1-23787 


CSCL 098 


03/ 63 


Unci as 
0014555 



ACKNOWLEDGEMENTS. 


Nicolas Treil 

I am particularly grateful to Ruzena Bajcsy who guided me through this whole project 
and who contributed to most of the ideas and concepts of the edge detection applications. 
I am also very grateful to Stephane Mallat for the time he spent introducing me to the 
theory of wavelets, and for supervising all the signal processing part of this work. I received 
a lot of help from Jing Zhao and Howie Choset on this particular project. I would also 
like to thank for their patience and helpful hints : Scott Alexander, Helen Anderson, Trevor 
Darell, Doug Earhart, Alberto Izaguirre, Douglas Katzman, Sang Lee, Raymond McKendall, 
Suzanne Morin, Arup Mukherjee, Yves Schabes and Steve Sherin. I enjoyed many valuable 
discussions with Ales Leonardis. Finally I would like to thank my parents for their moral 
and financial support. 

Edited by Howie Choset. 

I am grateful to Dr. R. Bajcsy for letting me work on this project. I would like to thank 
Helen Anderson for her constant support and understanding throughout my time on this 
project. Raymond KcKendall’s assistance was both useful and educational. I am grateful 
to him for the time and effort he put into me and this project. Thank you Arup Mukherjee 
for the helping me modify Nicolas’ system. I would like to thank Scott Alexander for his 
assistance at ten to five on Friday afternoons. I would also like to thank Dr. Richard Paul 
and Dr. Anthony D’Amico for their constant moral support. Finally, I would like to thank 
James Lotkowski for his patience. 



Abstract 


The general problem of computer vision has been investigated for more than twenty years 
and is still one of the most challenging fields in artificial intelligence. Indeed taking a look 
at the human visual system can give us an idea of the complexity of any solution to the 
problem of visual recognition. This general task can be decomposed into a whole hierarchy 
of problems ranging from pixel processing to high level segmentation and complex objects 
recognition. 

Contrasting an image at different representations provides useful information such as 
edges. First, we introduce an example of low level signal and image processing using the 
theory of wavelets which provides the basis for multiresolution representation. Like the 
human brain, we use a multiorientation process which detects features independently in 
different orientation sectors. So, we end up contrasting images of the same orientation but 
of different resolutions to gather information about an image. 

We then develop an interesting image representation using energy zero crossings. We 
show that this representation is experimentally complete and leads to some higher level 
applications such as edge and corner finding, which in turn provides two basic steps to 
image segmentation. We also discuss the possibilities of feedback between different levels of 
processing. 
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1 Introduction. 


The general problem of computer vision has been investigated for more than twenty years 
and is still one of the most challenging fields in artificial intelligence. Just by taking a look 
at the human visual system can give us an idea of the complexity of any solution to the 
problem of visual recognition. This general task can be decomposed into a whole hierarchy 
of problems ranging from pixel processing to high level segmentation and complex objects 
recognition. First, we introduce an example of low level signal and image processing using 
the theory of wavelets and then develop interesting image representation. We show that this 
representation is experimentally complete and leads to some higher level applications such 
as edge and corner finding. This, in turn, provides two basic steps to image segmentation. 
We also discuss the possibilities of feedback between different levels of processing. 


2 The multiresolution concept. 

2.1 Motivation. 


The information provided by an image lies in the local variations of the image intensity. 
From the image contrast, we can extract some important features such as the edges of the 
structures embedded in the image. At the border of such structures, the image intensity is 
likely to suddenly change. The local variations have therefore a high amplitude. 

However not all of the local variations have the same relevance to the understanding of 
a scene. For example, suppose that we are looking at a far away house. Moving closer to it 
would make us distinguish successively the doors and windows, then the bricks of the walls 
and the tiles, then the texture of these bricks and tiles. Separating the details appearing at 
each resolution would enable us to establish a hierarchy between these pieces of information. 
It would also allow a scale-invariant interpretation of the image. This scale invariance is a 
well known property of the human visual system. 

As it is not possible to define a priori an optimal resolution for analyzing images, several 
researchers have developed pattern matching algorithms which process the image at different 
resolutions. In some sense, image’s details at a coarse resolution provide the ’’context” of 
the image which helps in the analysis of finer details. For example, it is difficult to recognize 
that a small rectangle inside an image is the window of a house if we did not previously 
recognize the house ’’context”. It is therefore natural to first analyze the image details at a 
coarse resolution and then increase the resolution using a coarse to fine processing strategy. 
Another interesting approach is to study details’ propagation through different resolutions. 
A feature which appears at several resolutions is certainly very noticeable in the image and 
is likely to be part of an object’s contour. A feature appearing only at finer resolutions is a 
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high frequency variation of small amplitude. It probably corresponds to some texture. On 
the other hand, a feature which appears only at coarse resolutions is a smooth variation like 
shading. 


2.2 The theory of wavelets. 

Recently Stephane Mallat has introduced such a multiresolution representation of signals 
based on the theory of wavelets [1], using a finite set of resolutions namely the powers of 2 (a 
definite advantage for implementation). The basic idea is to separate the higher half and the 
lower half of the spectrum of a signal by using a second order band-pass filter and a low pass 
filter, to subsample the image corresponding to the lower half of the spectrum and to iterate 
the process. This is roughly equivalent to dividing the spectrum in successive bands [~ ,7r], 
[ii], ... and extracting independently the details corresponding to these 

bands. Suppose that our original signal has samples. It is then said to be at resolution 2 J . 
The result of the first band-pass filtering will give us the difference of information between 
resolution 2 j and resolution 2 i_1 . The result of the next band-pass filtering will give us the 
difference of information between resolution 2 J_1 and resolution 2 J ~ 2 and so on. The theory 
of wavelets shows that we can obtain filters which are very well localized both in the Fourier 
and spatial domain — thus preserving the locality of information — and that we can achieve 
a representation which is complete. 

In order to obtain a representation of the signal which translates when the signal translate, 
Stephane Mallat [2] has designed another type of representation of the band-pass filtered 
images based on their zero- crossings and the energies between consecutive zero-crossings. 
It is called an Energy-zero-crossings representation. This representation has proved to be 
experimentally complete for signals [1] and in this article we also show that it is complete 
for images. It is the basis of the higher level applications developed in this article. 


3 The multiorientation concept. 


The purpose of a multiorientation process is to detect features independently in different 
orientation sectors. Although this increases the amount of computation, it also structures the 
data organization of the feature detection processes. Let us take a simple example. Suppose 
we carry out isotropic edge detection on an image which contains a rectangle. It would be 
pretty difficult to recognize that the closed region we have detected is actually a rectangle; 
one obstacle among others is that most edge detectors smooth corners[3]. If on the other 
hand we detect horizontal edges and vertical edges independently, and then use a composition 
process to find closed regions from the interactions between these edges, we will define an 
object with two horizontal sides two vertical sides and four corners ... obviously a rectangle ! 
In both case the result of the analysis is a closed region but by using multiorientation we 
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have learned something about the shape of the region while building it. Of course this is a 
very simplistic example and the composition processes are far from being straightforward, 
but this gives a hint on the idea of any image analysis method based on decomposition: 
gaining knowledge while recomposing. We can of course think of gaining knowledge about 
the organization of a scene by using a composition process on multiresolution as well. 


It is interesting to note that multiorientation is one of the features of the human vi- 
sion system[4]. Indeed there exist brain cells which respond specifically to stimuli within a 
certain orientation and also at a certain frequency. This means that the brain performs a 
multiresolution multiorientation decomposition of the visual input. There are as many as 30 
main directions, where the divisions are finer around the horizontal and vertical axes. There 
also exist cells which respond specifically to corners. We can guess from this that the brain 
performs some composition processing as part of the whole visual recognition system. 


4 The decomposition process. 


In this section we describe in detail the decomposition process leading to the Energy-zero- 
crossings representation. Our purpose is not to explain the theory of wavelets. The choice 
of the filters as well as certain properties like the completeness of the representation have 
very sound mathematical bases which will not be discused here. 


4.1 Monodimensional dyadic wavelet decomposition. 


Let S j be a signal at resolution 2 j and W k be the signal of the differences between S k and 
S k ~ 1 for k less than or equal to j. Let Uz be the low-pass filter with a cutoff frequency 
of f and G z be the high-pass filter with a cutoff frequency of f . From the general idea of 
multiresolution, a simple processing would be: 


1. Filter by Gz . The result is VF- 7 

2. Filter S j by Uz and sample by 2. The result is 

3. Filter S J_1 by Gz . The result is VF J_1 

4. Filter S'- 7-1 by Uz and sample by 2. The result is S'- 7-2 and so on... 
This process is summarized in Figure 1. 
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Figure 1: Conceptual process . 

The final goal of the decomposition is to obtain zero-crossings which point out some 
information to us. In other word we would like them to correspond to abrupt changes 
in the signal intensity. The energy associated with these zero-crossings would provide a 
measurement of the abruptness of the variation. The other constraint is that we want 
to build a multiresolution representation which is complete. This leads to two important 
features of the decomposition process: 


• To achieve the completeness of the Energy- zero- crossings representation, we need a 
good precision on the zero-crossings. Thus we need to expand the results of the 
consecutive band-pass filterings before computing the positions of the zeros. This is 
done by performing a spline interpolation. 

• The filters have to fit the requirements of the theory of wavelets. After computation 
they appear to perform a rather smooth cutoff. Thus we cannot sample by two the re- 
sult of the low-pass filtering without loosing information [5]. A solution to this problem 
is to wait for several iterations of the process 1 before starting the actual sampling. 


From what we have just said, supposing that we want to expand the {W l ) ie ^_ n ^ by a 
factor 4 (typical expansion factor) the algorithm becomes: 


1. Filter S * by Gz . The result is W J 

1 We call iteration the list of operations yielding the isolation of the spectrum part between 2 >+i anc i 2* 
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2. Expand W J by a factor 4 

3. Filter S j by Us . The result is S j ~ 1 (but it still has 2 J samples !) 

4. Filter 5'- 7-1 by Gi . The result is W*~ x 

5. Expand by a factor 2 

6. Filter 5 J_1 by Us . The result is 5 J-2 

7. Filter S j ~ 2 by Gi i . The result is W j ~ 2 

8. Filter S^~ 2 by Us and sample by 2. The result is S*~ 3 

9. Filter S j ~ 3 by Gs . The result is W j ~ 3 

10. Filter S^~ 3 by Us and sample by 2. The result is S^~ 4 and so on... 

This is summarized in Figure 2. In Appendix A we give the algorithm for an expansion 
factor of 2 instead of 4. 
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As we can now see, we need two other kinds of filters with a cutoff frequency of ^ and 
respectively. One way to handle this is to periodize by 2 and 4 the filters in the Fourier 
domain (i.e. inserting zeros between coefficients in the spatial domain [5]). Indeed if we 
suppose that we have a low pass filter Ux with a cut off frequency of f as shown in Figure 3. 



Figure 3: Low pass filter Ux. 

If we periodize this filter by two as shown in Figure 4 we obtain a filter U& ^ which 
behaves as the combination of a low-pass filter with a cut off frequency of \ and a high-pass 
filter with a cut off frequency of However we use this filter after having already filtered 
the signal by Ux. The spectrum of the signal is then nearly null for frequencies higher than ^ 
so the filter U 2 1 *x really behaves as a low-pass filter as its high-pass capabilities are applied 

4 9 4 

to the null part of the spectrum. 

This periodization technique reveals quite efficient and the distortion which might occur 
— due to the fact that our filters are far from being ideal — is negligible. 

We can now compute the zero- crossings and energies on the (W l ) ie y_ n -y The integration 
method we used for energy computation is based on the spline interpolation[6] and is exact 
for cubic spline functions. 
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Figure 4: Low pass filter U * periodized by 2. 

4.2 Bidimensional dyadic wavelet decomposition. 
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Figure 5: Splitting of the spectrum using wavelet filters. 

The bidimensional multiresolution multiorientation decomposition process we implemented 
reveals to be much more complex. First of all, the square shape of pixels limits immediately 
the number of possible orientations one can handle without encountering tremendous imple- 
mentation problems. Thus we chose the four main directions: horizontal, vertical and the 
two separate diagonals. This means that for each resolution we use four distinct directional 
high pass filters and a low pass filter. In terms of spectrum Figure 5 shows what parts of 
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the spectrum are isolated by each filter Gx f i, G*, 2 , G* )3 , Gi 4 and £/*. The 
filters are high-pass filters along one direction and low-pass filters along the perpendicular 
direction. The zero- crossings will be computed on the corresponding filtered images along 
the high-pass filtering direction. In order to save time and memory space, the filtered images 
will also be sampled along the low pass filtering direction. Notice that the G|, 2 and 

JJxl filters are separable. The G^ t i filter can be obtained by filtering by the one dimensional 
filter Gjl along the lines and by the one dimensional filter U* along the columns. The G-| )2 
filter can be obtained by filtering by the one dimensional filter U& along the lines and by the 
one dimensional filter G| along the columns. The two dimensional filter Uz can be obtained 
by filtering by the one dimensional filter U& along the lines and the columns. 


Of course the two constraints mentioned in section 4.1 are still in effect. Thus we must 
expand the filtered images along the high-pass filtering direction before computing the zero- 
crossings, and we must also wait for several decomposition steps before starting to sample 
along the low-pass filtering direction. To these two constraints we must add a third one 
which is that our diagonal filters G.e )3 and G^ t 4 are not separable. As a consequence, they 
must be of reasonable size. This is rendered quite complicated by the fact that they aie 
not 2x periodic in both directions. Indeed let us consider Figure 6 which shows the main 
support 2 of the fourier transform of wavelet filter G.& >3 . 



Figure 6: Diagonal filter theoretical support. 

If we mentally periodize the pattern with a period of 2,7 r along both axes we realize easily 
that there are some points of abrupt changes on the border of the dashed square. This means 
that if we perform a reverse discrete Fourier transform on this filter in order to define it m 
the spatial domain, the coefficients we obtain do not decrease rapidly at all (like those of 
a cardinal sinus). Using this kind of filter would lead to a very expensive convolution. An 
alternative to this is to shrink the spectrum of the image to make it fit between — f and f 

2 Idealy, the filter should be 1 inside the area in black and 0 elsewhere. However our filter is smooth to 
reduce its number of coefficients and to fit the requirements of the theory of wavelets. So the area in black 
corresponds rather to the filter’s main support which is the area of the Fourier space where its value is 
nearer to 1 then to 0. 
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on each axis, and to filter it with a filter whose Fourier transform’s main support is given 
by Figure 7 thus isolating the correct part of the spectrum. Shrinking the spectrum can be 
done, in terms of spatial operation, by interpolating the image by a factor 2 using a spline 
interpolation [5] . The filter of Figure 7 doesn’t have any discontinuities when you periodize 
the pattern with a period of 2 tt. As a consequence, it will have a small number of coefficients 
in the spatial domain. The coefficients of this filter are given in Appendix B. 



Figure 7: Diagonal filter actual support. 

We have to fit these three requirements within the same algorithm. Supposing we want to 
expand the band-pass filtered images by 4 for zero-crossings computation the decomposition 
is made using the following steps: 


Let S j be an image at resolution 2 j and W k W£ and be the image details at 
resolution 2 k with k less than or equal to j in the horizontal, vertical, first diagonal and second 
diagonal directions respectively. Let 4 j ^e directional band-pass filters whose 

general shapes are given in Figure 5 but which isolate the parts of the spectrum between -fi 
and - 4 L- (instead of between f and tt). Let 6U the low-pass filter of cutoff frequency ~. 


1. Expand S j by a factor 2 along both horizontal and vertical axes using spline interpola- 
tion. This will confine its spectrum between — ^ and We will refer to the resulting 
image as S%. 


2 . 


Filter S{ by the ^ an< * sample by 2 along the low-pass direction . 

results are the (kFj?) i 


The 


3. Expand the (Wjty ^ by 2 along the band pass filtering direction using spline inter- 
polation. 

4. Filter S J e by the low-pass filter Us. . We call the result S J_1 (but it still has 2 J+1 samples 
!)■ 
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5. Filter S j ~ 1 by the (Gs. k ) and sample by 4 along the low-pass direction . The 

\ 8 ’ /A:6[l--4] 

result is the {wr') k ^. A y 

6. Filter S j ~ 1 by the low-pass filter Uz and sample by 2 along each axis . The result is 

S j ~ 2 . 

7. Filter S j ~ 2 by the (Gs. k ) and sample by 4 along the low-pass direction . The 

J V 8 ’ /fc€[l..4] 

result is the {Wt~ 2 ) ks[1A] - 

8. Filter S^~ 2 by the low-pass filter Ux and sample by 2 along each axis . The result is 
S j ~ 3 and so on ... 


In Appendix A we give the algorithm for an expansion factor of 2 instead of 4. 

We can now compute the Energy-zero-crossings representation on the set of detail images 
(Wfc)(; k)e[j-n jjx[i 4 ] alon g the band-pass filtering directions. The integration we used method 
for energy computation is based on the spline interpolation and is exact for cubic spline 
functions. 


4.3 Energy-zero-crossings representation. 

Let us consider the monodimensional case first, building an energy-zero-crossings represen- 
tation from a dyadic wavelet representation consists in coding the detail signals not in terms 
of pixels but in term of zero-crossings and energies between adjacent zero-crossings. Indeed 
if the filters are properly chosen so as to make them proportional to the second derivative 
of a smoothing function, the energy- zero- crossings representation directly points out to us 
where the information is, and gives us a measure of the information’s importance. The two 
components of this representation are dealt with at different levels. First the detail signal is 
transformed into a chained list of energy patches which are a representation of the energy 
of each arch 3 of the signal. The information retained for each energy patch is the following: 


• let S be the signal and x 0 and x\ two adjacent zero-crossings. 

• let e be the energy of the arch between z 0 and x^: 

e = [S(z)Nx 

• let e be the sign of S between x 0 and X\. 

3 We call arch the portion of the signal between adjacent zero-crossings. 
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• let l the arch length defined by: 

l — X \ — Xq 

• let a be defined by: 

al — ee 2 

• then a and l are the characteristic parameter describing the energy patch. 


Then the signal representation looks like: 



Figure 8: Energy list structure. 


Having defined what an energy patch is we can now take the following structure for the 
zero- crossings: A zero-crossing is defined by its abscissa, the energy patch on its right and 
the energy patch on its left. All this is summarized in figure 9. 



Figure 9: Energy-zero-crossings structure. 


The two dimensional Energy-zero-crossings representation is a direct generalization on 
the one- dimensional case. Each detail image has a band-pass filtering direction and a low- 
pass filtering direction. We compute the energies and zero crossings along the band-pass 
filtering direction treating these lines as one dimensional signals. For the (W 1 *) ie y_ n ..j] the 
zero-crossings are computed along the lines, for the (W3) i€[y -_ n ..j] they are computed along the 
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columns, and for the (V^) i€[i _ n ^ (resp. (Wl) ie[j _ n j 3 ) they are computed along the positive 
(resp. negative) diagonal 4 . Figure 10 shows the Energy- zero-crossings representation of the 
dyadic decomposition of the image of Figure 13 (section 5.2) for the horizontal and vertical 
orientations at the first three resolutions 2 J_1 , 2^~ 2 and 2 J-3 . 


4 We call positive diagonal the diagonal that goes from bottom left to top right, and negative diagonal 
the diagonal that goes from top left to bottom right 
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Figure 10: Energy- Zero- Crossings representation (horizontal & vertical). 
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5 Completeness of the representations. 

5.1 Dyadic wavelet transform. 


Stephane Mallat has previously demonstrated that the dyadic wavelet transform was com- 
plete for signals and images. However only the mono dimensional simulation had been im- 
plemented. In this paper we show results concerning a bidimensional implementation of the 
dyadic reconstruction process. The reconstruction is based on the following property of the 
U and G filters: 

\ G ( u ;)\ 2 + \U(l;)\ 2 = 1 

or, in two dimensions: 

^2\Gi(u) X ,L;y)\ 2 + \U(u> x ,LOy)\ =1 

t=l 

This means that if S j is a signal and if we obtained S 3 ' 1 and W 3 by filtering by U and G 
respectively, the original signal can be obtained by filtering S 3 ~ l by U, by filtering W 3 by G 
and by adding up the results. This is summarized in the following equation: 

S 3 = S 3 ~ l *u + W 3 *g 

Where u and g are the impulse responses of filters U and G. In fact this is not exactly true. 
Indeed we must remember that we did certain operations of spline expansion and sampling 
to obtain S 3 " 1 and W 3 . Thus we must reverse these operations . The reverse of a spline 
interpolation can be performed using a reduction/projection as shown by Stephane Mallat [1], 
and the reverse of a decimation is an spline interpolation^]. 


5.1.1 One dimensional algorithm. 


We can now give the one dimensional reconstruction process. Supposing that we have de- 
composed signal S 3 as described in Figure 2 and obtained W 3 , W 3 , W 3 , W 3 and S 
we can reconstruct S 3 this way: 
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1. Expand S 3 ~ 4 by a factor 2. The result is S 3 ~ 4 . 

2. Filter S 3 e ~ 4 by t/i. 

3. Filter W 3 ~ 3 by G&. 

4. Add up the results. This gives S 3 ~ 3 . 

5. Expand S 3 ~ 3 by a factor 2. The result is S 3 ~ 3 . 

6. Filter S 3 e ~ 3 by Ux. 

7. Filter W 3 ~ 2 by Gx. 

8. Add up the results. This gives S 3 ~ 2 . 

9. Filter S 3 ~ 2 by U*. 

10. Reduce W 3 ~ l by a factor 2. The result is W 3 ~ d 

11. Filter WiJ by Gx. 

12. Add up the results. This gives S 3 ' 1 . 

13. Filter S 3 ~ 1 by Ux. 

14. Reduce W 3 by a factor 4. The result is W 3 ed 

15. Filter W 3 red by Gx. 

16. Add up the results. This gives S 3 . 

5.1.2 Two dimensional algorithm. 

For images, the algorithm is a little more complicated. The major obstacle to reconstruction 
is the square grid. Let us first describe the basic steps. Supposing that we have decom- 
posed signal S 3 and obtained the {^k) ke[1 4 y { W k ^ ke[1 , A y 2 ) fc€ [i.. 4 ] and 3 we Can 

reconstruct S 3 this way: 

1. expand S 3 ~ 3 by 2 along each axis. The result is S 3 ~ 3 . 

2. Expand the (W£” 2 ) fce[i 4 j by 4 along the low-pass filtering direction. The results are 
the KA- 2 )^,,/ 

3. Filter the (\V exp r 2 )* e[ i.. 4 ] by the ( G f ■*) Jte[i-4] ' and S by U >' 
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4. Add up the results. This gives S 3 ~ 2 . 

5. expand S j ~ 2 by 2 along each axis. The result is S 3 ~ 2 . 

6. Expand the (W*" 1 ) 4] 4 a l° n g the low-pass filtering direction. The results are 

the (w exp rT e|1 4 ,- € 

7. Filter the (\V exp i -1 )ib6[i..4] by the ( G f4 E [i..4] and by U *' 

8. Add up the results, this gives 5 J_1 . 

nd expand them 

A 

k Jke[i.A]' 

10. Filter the (VF„ P i), e[1 4] by the (G* ,*) te[1 .. 4] and ^ b y U 1’ 

11. Add up the results, this gives S{, which is S 3 expanded by a factor 2. 

12. Reduce S 3 by 2 along each axis. This gives S 3 . 


9. Reduce the (Wi) by 2 along the band-pass filtering direction s 

V K ; ke[i..4] 

by 2 along the low-pass filtering direction. The results are the [W exp 


The critical steps here are steps 2, 6 and 9 for the images corresponding to the two 
diagonal directions. These images have been obtained by sampling along the diagonal lines. 
Due to the square shape of the pixels, it is very difficult to interpolate again along the 
diagonal lines to reverse the sampling. The problem is that we have an array with only 
certain diagonals containing pixels, and we want to fill in the empty pixels. Let us consider 
an image sampled by four along the “negative” diagonal 5 . 

The process we have chosen is to interpolate by four along the negative diagonal, as 
shown in Figure 12 (notice that the new pixels do not fit the square grid) and to fill in the 
remaining empty pixels by linear interpolation along the positive diagonal. For example in 
Figure 12, value at pixel “c” would be equal to the mean of value at pixel “a” and value at 
pixel “b”.’ 

Of course this interpolation is not at all accurate since we perform a linear interpolation 
—which is a rather poor interpolation— along the direction containing the high frequencies. 
However, this error appears only on the diagonal lines, for which the human visual system 
has less sensitivity. Thus it is very difficult to detect this error except when looking at the 
image of the differences between the original picture and the reconstructed picture. 

5 We call positive diagonal the diagonal that goes from bottom left to top right, and negative diagonal 
the diagonal that goes from top left to bottom right 
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Figure 11: Image sampled along the negative diagonal. 

5.2 Completeness of the Energy-zero-crossings representation. 

The Energy-zero- crossings representation was proved to be experimentally complete for one 
dimensional signals. Let us remember how the reconstruction works in one dimension. Hav- 
ing decomposed a signal S - 7 into a set of details (VF*) ie y_ n ^ and computed a set of zero- 
crossings and energies (Z l ) ie y_ n j], the method is the following: 


1. Take any signal S{ . 

2. Decompose S{ into a set of details (Aj) l€[i _ n using the USUal decomposition method 
given in section 4.1. 

3. Match the zero-crossings and energies of the (X{) i€ [ J -_ n ^ with the 
is described in detail in [1]. The basic principle is: 

(a) Add a piecewise linear continuous function to each X[ in order to create zero- 
crossings at the right location. 

(b) Suppress spurious zero- crossings by adding other piecewise linear functions. 

(c) Adjust the energies by multiplying each arch between two consecutive zero-crossings 
by a proper coefficient. 

This gives the (T 1 I ) ie[j _ n .. i ]. 

4. Reconstruct a signal S J 2 from the using the usual dyadic reconstruction 

method given in section 5.1.1. 
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Figure 12: Diagonal interpolation . 


5. Decompose S 2 into a set of details 

6. Match the zero- crossings and energies of the (X 2 )i^y- n ..j] with the (-27 , )»6[j-n..j]* 

7. reconstruct a signal S 3 from the (FJ )t€[i_n..j] an< ^ so on — 


The S J k converge rapidly towards the original signal S J . In order to reduce the number of 
iterations, we take as the S{ signal the dyadic reconstruction of a set of piecewise constant 
signals (Xj) i€ ^_ n ^ which energies and zero- crossings already match the (Z*)t € [y-n..jj* Five 
or six iterations usually give a satisfactory reconstructed signal. 

The extension to images is quite immediate. We computed the energies and zero crossings 
for each line along the band-pass filtering direction treating these lines as one dimensional 
signals. Thus we can directly generalize the above mentioned algorithm to images. The 
reconstruction process works very well on images, but the only problem is that border effects 
are not totally controlled as they are in case of one dimensional processes. Figure 13 shows 
on the right the image reconstructed from the Energy-zero- crossings representation of the 
image on the left. The reconstruction took 5 iterations. The reconstruction process is quite 
accurate, and since the human visual system is very resistant to distortion, the differences are 
hardly noticeable to the naked eye. Figure 14 shows the enhanced image of the differences 
between the two images. As a comparison, the pixel values in the original image use the full 
range [0,255] whereas black and white colors in Figure 14 correspond to values -10 an TlO 
respectively. Notice that errors occur mainly along the diagonal contours at all frequencies. 
This is due to the diagonal interpolation method as explained in section 5.1.2. 
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Figure 13: Original image and reconstruction from the Energy-Zero-Crossings representation. 



Figure 14: Differences between the original image and its reconstruction. 
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6 Applications to higher level vision processes. 

6.1 Edge detection. 


The Energy-zero-crossings representation, together with the multiorientation feature, is a 
useful model for edge finding algorithms. Indeed the zero-crossings point out where local 
variations are and the energies give a measurement of the intensity of these variations. 


Let us emphasize some terminology points here. It is important to define the meaning of 
the word “direction” in what follows, supposing that we have applied a horizontal band-pass 
filter and a vertical low-pass filter to an image, the direction of the edges we will look for 
is the vertical direction (vertical edges are horizontal variations) thus we will refer to the 
vertical direction as the main edge direction whereas we will call the image an image of 
horizontal variations. 


6.1.1 Basic edge-detection algorithm. 


As a start we can simply decide that a zero-crossing belongs to an edge if the energies on 
the right and on the left of it are greater than a certain threshold. Growing an edge from 
this point is then easy because we know in which direction we must look. Figure 15 gives 
us an example of what a vertical edge looks like in terms of zero- crossings of an image of 
horizontal variations. 

Having detected in row i a zero-crossing which has sufficient energy to possibly belong 
to an edge, we look in row i + 1 for the nearest zero-crossing having the same sign 6 . Two 
situations can then occur: 


1. The energies on the right and/or left of the chosen zero-crossing are under the threshold. 
Then the edge cannot be defined further. This situation arises at the end of actual 
edges, or in textured regions where there are some energetical zero-crossing but which 
do not form organized patterns from row to row. 

2. The energies on the right and left of the chosen zero-crossing are above the threshold. 
We can chain the zero- crossings together and search row i -f 2 for the following zero- 
crossing of the edge. 


6 By convention the sign of a zero- crossing is the sign of the arch on its right 
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Figure 15: Edge detection using zero-crossings. 

This process is called edge-growing. Similarly we can investigate rows i — 1, i — 2 
and so on to extend our edge. In our implementation, we have again used the zero-crossing 
structure described in Figure 9 except that our zero-crossings are not linked along one line, 
but between consecutive lines. 


6.1.2 Algorithm improvement. 


Of course this algorithm is far too simple to give satisfactory results. First of all, if we 
use a single threshold, we often miss part of the edges we define due to contrast variations. 
However we must remember that we do not examine each zero-crossing independently to 
decide whether or not it is part of an edge. Our approach is to look for zero-crossings which 
would extend previously detected edges. Thus we use connectivity as a criterion. Having 
realized that connectivity is part of the decision process, we can then reduce the threshold 
constraint while growing the edge. So we now have two different thresholds. The first one is 
used to detect starting points for our edge growing loop, and the other one, which is weaker, 
is used while actually growing the edges. 

The next technical problem is that although now we can detect connected “chains” of 
zero-crossings, not all these chains correspond to actual edges in the image. The reason for 
this is the following: The theory of wavelet provides us with a large class of possible filters. 
We then put some requirements for the filter we choose: 


• It has to be proportional to the second derivative of a smoothing function, so that the 
zero- crossings mark inflection points. 
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• It has to be localized in space so that we can accurately locate these inflection points. 
This is also important for computation speed. 


The trade-off to this is that our filter has “bounces”, instead of being nicely smooth like for 
example the laplacian of a gaussian. Figure 17 shows the zero- crossings corresponding to two 
step edges of Figure 16 for resolutions 2- 31 and 2-^ * . We can see that for each edge there is a 
main zero-crossing and a minor one on each side. If the energies of these minor zero-crossing 
happen to be above the threshold, we will detect some echos surrounding our edge. 
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Figure 17: Zero-crossings of a step edge . 

However let e max be the maximum of the left and right energy for one of these minor 
zero-crossings, and e m j n be the minimum, the value of the ratio characteristic of the 
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edge type. If this edge is an echo, this ratio is usually 15 to 20 % depending on the filter we 
chose, whereas for real edges, its value is above 40 %. This is a simple way of discriminating 
echo edges from actual ones. Figure 18 and Figure 19 shows the edges of the image in 
Figure 13 as they appear at resolutions 2 j and 2 j_1 . Notice that some edges are redundant 
whereas some appear only at a the coarser or finer resolution. Notice also that the junction 
between orientations is smooth if the actual edge is spatially smooth (i.e. if its curvature is 
small) . 
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6.1.3 Edge finding and multiorientation. 


It is possible that an edge will appear on several images corresponding to different orientation. 
Indeed we chose filters that were localized in the spatial domain. As a consequence, their 
cut-off capacities were smothered in the frequency domain. This causes the filters to overlap 
both between orientations and resolutions. For example if while looking for vertical edges 
we detect an edge which is bent 30° from the vertical, there are great chances that this edge 
will also be detected by looking at the zero-crossings of the image of diagonal variations. In 
fact we can control the amount of redundancy by including constraints on the direction in 
the edge-growing algorithm. For example we can stop growing the edge if the next potential 
segment happens to make an angle of more than 30° with the main edge direction, as we 
expect it to be detected in another orientation anyway. This is important because we might 
want to build algorithms which link continuous segments, such as circles, having some parts 
in different orientations. 


6.2 Corner detection. 


As stated in section 3 if we use a pattern recognition process based on image decomposition 
and recomposition, multiorientation is a definite advantage for the detection of interesting 
structural points like corners, “T” junctions and so on. Figure 21 shows the horizontal and 
vertical edges of the image in Figure 20 representing a model of the Penn campus. It shows 
that the corners are not affected by our edge detection method. All we have to do is to take 
edges within two different orientations and to look for crossing points. 


The wavelet filters are localized enough to prevent ambiguities on corner detection. More- 
over, we can get accurate information about the tangents at the corner location. These corner 
points are relevant to the image understanding for several reasons. First of all, corners are 
usually the points of an object’s contour which are the most useful to define it’s geometry. 
For example, it much easier to detect motion by observing the corners than by trying to 
match edges. Also corner points are generally points of interaction between several objects 
in the scene. A “T” junction for example is a point which which plays a role in the definition 
of three different objects or regions, so there is a lot of information associated to such a 
point. Another reason is that unconnected edges very seldomly exist in real scenes. Or if 
these exist, they often correspond to a lack of information we should take care of. The best 
example of this are illusory contours. The human visual system reacts to illusory patterns 
by mentally adding some edge portions to incomplete segments so as to try to define a rec- 
ognizable shape. If we want to detect unconnected edge segments for further analysis, it 
is a good idea to first detect edges which are connected by corners, then to deal with the 
remainder. 
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Figure 20 : Image : Penn campus model. 


6.2.1 Algorithm. 


Our algorithm quite straightforward. Given L\ and L 2 two edges (i.e. chained lists of points) 
in different orientations, let p\ and P2 he the first points of L\ and L2 respectively: 


1 . move along L\ to find the point p ^ such that the distance d(pi,p 2 ) is locally minimum. 
This means that if b and a are the points before and after p' 1? both d(a,p 2 ) and d( 6 ,p 2 ) 
are greater than d(p[,p2)- 

2 . move along L2 to find the point p' 2 such that the distance d(p' 1 ,p 2 ) is locally minimum. 

3 . move along L x forward or backward from pi, depending on which direction decreases 
the distance, to find the point p" such that d(p",p' 2 ) is locally minimum. If p" equals pi 
then pi and p 2 are locally the closest points between L\ and L2 and should be examined 
as possibly defining a corner. If p” is different from pi then: 

4 . move along L 2 forward or backward from p 2 , depending on which direction decreases 
the distance, to find the point p" such that d(p",p 2 ) is locally minimum. If p 2 equals 
p 2 then Pi and p 2 are locally the closest points between L\ and L2 and should be 
examined as possibly defining a corner. If p” is different from p 2 then: 

5 . move along Li forward or backward from p'{ to find the point p"' such that d(p'" ,p 2 ) 
is locally minimum and so on... 


Once we have obtained points q\ and <72 which are the points of L\ and L2 locally closest 
to each other, we decide according to d(q u q 2 ) whether these points define a corner, if d(qi , q 2 ) 
is smaller than a certain threshold, we compute the exact position of the point where L x and 
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Figure 21: Edges of image “ Penn campus ” in 


2 orientations at resolution 2 3 1 
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L 2 cross and the tangents to the edges at this point. If the edges do not actually cross we 
extend the last segment in order to extrapolate the corner point as shown in Figure 22. 



Figure 22: Extrapolation of a corner point. 

The value of the threshold is not at all empirical. If we examine our decomposition 
process, we realize that at a any resolution , and for any orientation, the spatial resolution on 
the band- pass filtering direction is four times greater than on the low-pass filtering direction 7 . 
Let us we take the scale of the original image to compute point coordinates. The position 
uncertainity is then four times greater on the low-pass filtering direction. We take as a 
threshold the uncertainity corresponding to the low-pass filtering direction’s resolution. Let 
us take an example: Suppose that we decompose a signal S j with 2 J samples into a set of 
details (W£) ( - k)e[j . n j] x p 4 ] with an expansion factor of 4. If we extract the vertical edges 

from the zero-crossings of Wl 1 , knowing that VF/ 1 has 2 J 1 rows and 2' 7 "*’ 1 columns, we 
can expect an uncertainity of 2 on the vertical coordinates of the points which define these 
vertical edges. Similarly we can expect an uncertainity of 2 on the horizontal coordinates of 
the points of the horizontal edges. So the two points defining a possible corner can be as far 
as 2\/2 from one another due to the combination of these uncertainities. We will then take 
2y/2 as a threshold value and assume that two points define an actual corner if the distance 
between them is smaller than that. Figure 23 shows the corners detected from the edges 
shown in Figure 21. 


6.2.2 Corner resolution. 


Of course, an interesting question is to know what is the minimum angle a corner must have 
in order to be detected. Part of the answer is that a corner will automatically be detected 
if the two edges that define it appear in separate orientations. Thus the performance of our 
corner detection process is linked directly to the number of different orientation we choose 
to separate, as well as, to the amount of redundancy between them. However as we said 
before, it is very difficult to implement more than four different orientations. 

7 Or twice greater, depending on the expansion factor mentioned in section 4.1. 
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Figure 23: Comers of image “ Penn campus ” at resolution 2 J 
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An alternative could be to use only one separable filter corresponding to a horizontal 
band-pass filter and a vertical low pass filter, for example, and to apply it to several rotation 
interpolations of the original image. Of course while performing the rotation interpolation 
we will loose high frequencies. However we can still perform edge and corner detection at 
all resolutions except the first one. 


6.3 Unconnected lines detection: an example of feed-back. 


As stated earlier, unconnected line segments occur very seldom in natural scenes. However, 
many edge detectors which are not based on region growing, including ours, fail to connect 
some edges. Knowing that unconnected patterns are very unlikely, we can localize the areas 
where our edge detection process gives incomplete results, and tune the process differently 
in those areas. Also, we have the alternative of using different methods for this particular 
image portion such as using another picture with a different lighting condition, or artificially 
enhancing the contrast. 

Detecting edges with loose ends is done simply by looking at the two ends of each edge 
and to see whether they have been used to define corners. If either of them is not, it is 
considered as loose end. As we now try to make an adaptive process by locally tuning the 
parameter of our edge detector, it is useful to regroup these loose end points into clusters. 
This way we can apply some feed back at different levels. Either we change the image 
acquisition system in the area corresponding to a particular cluster, or we locally change the 
threshold and try to extend and connect the lines within a cluster. The clustering algorithm 
is based on the minimum spanning tree computation and is given in Appendix C. Results 
are shown in Figure 24. The circled area correspond to regions where the edges are generally 
less obvious. 


A Dyadic decomposition with an expansion by 2. 

A.l One dimensional algorithm. 

Let S j be a signal at resolution 2 j and W j be the signal of the differences between S J and 
S^ -1 . Let Ujl be the low-pass filter with a cutoff frequency of ^ and Gjl be the high-pass 
filter with a cutoff frequency of J-. supposing that we want to expand the {W l 2 ) ie ^_ n ^ by a 
factor 2 the algorithm is: 


1. Filter by G x . The result is W* 

2. Expand W j by a factor 2 
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3. Filter S j by Ujl . The result is S 3 ' 1 (but it still has 2 3 samples !) 

4. Filter S 3 " 1 by Gjl . The result is W 3 ~ x 

5. Filter S 3 ' 1 by Ujl and sample by 2. The result is S'- 7-2 

6. Filter S 3 ~ 2 by Gi . The result is W 3 ~ 2 

7. Filter S 3 ~ 2 by Ujl and sample by 2. The result is S 3 ~ 3 

8. Filter S 3 ~ 3 by Gjl . The result is W 3 ~ 3 

9. Filter S j ~ 3 by Ujl and sample by 2. The result is S 3 ~ 4 5 6 7 and so on... 


A. 2 Two dimensional algorithm. 


Let S 3 be an image at resolution 2 j and Wf W 2 fc W 3 fc and W 4 fc be the image details at resolution 
2 k with k inferior or equal to j in the horizontal, vertical, first diagonal and second diagonal 
directions respectively. Let (^G jl ^ ^ ^ be the directional band-pass filters whose general 

shapes are given in Figure 5 but which isolate the parts of the spectrum between and 

(instead of between and tt). Let the low-pass filter of cutoff frequency ^r. 

1. Expand S j by a factor 2 along both horizontal and vertical axes using spline interpola- 
tion. This will confine its spectrum between — f and We will refer to the resulting 
image as S 3 . 

2. Filter S 3 by the (Gjl^) and sample by 2 along the low-pass direction . The 

results are the ( W k) ke[1 A y 

3. Filter S 3 by the low-pass filter Ujl and sample by 2 along each axis . The result is S 3 " 1 
(but it has 2 3 samples). 

4. Filter S 3 ~ x by the (Gjl k ) and sample by 4 along the low-pass direction . The 

J \ 4» / /c€[ 1-4] 

result is the (»*“') fc€t i.. 4 ]- 

5. Filter S 3 " 1 by the low-pass filter Ujl and sample by 2 along each axis . The result is 
S 3 ~ 2 . 

6. Filter S j ~ 2 by the (Gjl and sample by 4 along the low-pass direction . The 

J \ 4 ’ / A;£[1..4] 

result is the (WT 2 )* 6[ i.. 4r 

7. Filter S 3 ~ 2 by the low-pass filter Ujl and sample by 2 along each axis . The result is 
S 3 ~ 3 and so on ... 
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B Coefficients of wavelet filters. 


Below are the coefficients of filters Us and Gs. This filters are symmetrical with respect 
to 0. To obtain the filters Us and Gil we periodized the filters by inserting one 0 between 
each coefficients as explained at the end of section 4.1. To obtain the filters Us and Gs we 
periodized the filters by inserting three 0 between each coefficients. 


Us filter 

2 

abscissa 

value 

0 

0.587132 

1 

0.274067 

2 

-0.056679 

3 

-0.021478 

4 

0.013405 

5 

-0.003085 


G. tl filter 

2 

abscissa 

value 

0 

-0.590628 

1 

0.267548 

2 

0.047401 


-0.020079 

4 

-0.001451 


0.002588 

6 

-0.000750 


These are the coefficients of filter Gs- 3 shown in Figure 7. This filter is symmetrical with 
respect to the point ( 0 , 0 ). To obtain the filters Gs t 3 we periodized this filter by inserting 
one 0 between each coefficient along the lines and columns. 


■ 

Gs , 3 diagonal filter 

■ 

-4 

-3 

-2 

-1 

0 

-1 

2 

3 

4 


-0.01216 

-0.00131 

-0.07572 

0.00370 

0.17657 

0.00370 

-0.07572 

-0.00131 

-0.01216 

ID 

0.0 

-0.01478 

0.00172 

0.12439 

0.00370 

-0.11448 

-0.00476 

0.01598 

0.0 

ID 

0.0 

0.0 

0.03834 

0.00172 

-0.07572 

-0.00476 

0.03815 

0.0 

0.0 

ID 

0.0 

0.0 

0.0 

-0.01478 

-0.00132 

0.01598 

0.0 

0.0 

0.0 

ID 

0.0 

0.0 

0.0 

0.0 

-0.01273 

0.0 

0.0 

0.0 

0.0 


C Clustering algorithm. 


Our clustering algorithm is designed to regroup points in roughly circular shaped clusters of 
a given maximum radius. This algorithm uses a second parameter which is the maximum 
distance between two points in a cluster. 

Let us suppose we have a set of points (Fi);e[i..n]' Let d max be the desired maximum 
distance between two points in a cluster and r max be the desired maximum radius of clusters. 
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The algorithm is the following: 


1. Compute the minimum spanning tree of the (Pi) ie [i The result is a set of arcs 

)t€:[l..m] ' 

2. Remove every arc greater than d max . This yields several sub-trees. Points which belong 

to the same sub-tree form a cluster. The result is thus a set of clusters (Ci ^ 

3. For each cluster C\°^ compute the center of gravity gi of all the points in it , then 
compute the distances between gi and each point in the cluster. Let r t - be the maximum 
of these distances. 

4. For each cluster C\°\ if r* is greater than r max then break the longest arc of the 
corresponding sub-tree. This yields two ^sub-subtrees defining two sub-clusters C * 
and Cf 

5. Repeat the two previous steps on the C\ 2 3) 4 5 6 until no further subdivision occurs. The 
result is a set of clusters C { fulfilling the desired requirements. 
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